Towards More Robust Natural Language Understanding
Natural Language Understanding (NLU) is a branch of Natural Language Processing (NLP) that uses intelligent computer software to understand texts encoding human knowledge. Recent years have witnessed notable progress across various NLU tasks driven by deep learning techniques, especially pretrained language models. Beyond more advanced model architectures, constructing more reliable and trustworthy datasets also plays a major role in improving NLU systems; without such datasets it would be impossible to train a decent NLU model. Notably, the human ability to understand natural language is flexible and robust. In contrast, most existing NLU systems fail to achieve desirable performance on out-of-domain data or struggle to handle challenging items (e.g., inherently ambiguous items, adversarial items) in the real world. Therefore, for NLU models to understand human language more effectively, the study of robust natural language understanding should be prioritized.
In this thesis, we view NLU systems as consisting of two components: NLU models and NLU datasets. Accordingly, we argue that, to achieve robust NLU, the model architecture/training and the dataset are equally important. We focus on three NLU tasks to illustrate the robustness problem in different settings and present our contributions (i.e., novel models and new datasets) toward more robust natural language understanding. The major technical contributions of this thesis are:
(1) We study how to utilize diversity boosters (e.g., beam search & QPP) to help a neural question generator synthesize diverse QA pairs, upon which a Question Answering (QA) system is trained to improve generalization to unseen target domains. Notably, our proposed QPP (question phrase prediction) module, which predicts a set of valid question phrases given an answer evidence, plays an important role in improving the cross-domain generalizability of QA systems (a minimal sketch of this module follows the abstract). In addition, we construct a target-domain test set, approved by the community, to help evaluate model robustness under the cross-domain generalization setting.
(2) We investigate inherently ambiguous items in Natural Language Inference (NLI), i.e., items for which annotators do not agree on a label. Such items are overlooked in the literature yet occur often in the real world. We build an ensemble model, AAs (Artificial Annotators), which simulates the underlying annotation distribution to effectively identify inherently ambiguous items. AAs handle these items better than vanilla model architectures because the model design captures the essence of the problem.
(3) We follow standard practice to build a robust dataset, COUGH, for the FAQ retrieval task. Our dataset analysis shows that COUGH reflects the challenges of real-world FAQ retrieval better than its counterparts, and the imposed challenge will push forward the boundary of research on FAQ retrieval in real scenarios.
Moving forward, the ultimate goal of robust natural language understanding is to build NLU models that behave like humans. That is, robust NLU systems should transfer knowledge from the training corpus to unseen documents reliably and withstand challenging items even without a priori knowledge of users' inputs.
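To make the QPP idea in contribution (1) concrete, here is a minimal, hypothetical sketch of a QPP-style module as a multi-label classifier in PyTorch. The phrase inventory, class names, and dimensions are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical QPP-style module: given a pooled encoding of an answer
# evidence, predict which question phrases could start a valid question.
import torch
import torch.nn as nn

QUESTION_PHRASES = ["what", "who", "when", "where", "why", "how", "how many"]

class QuestionPhrasePredictor(nn.Module):
    def __init__(self, hidden_size: int = 768, num_phrases: int = len(QUESTION_PHRASES)):
        super().__init__()
        # Scores each candidate question phrase for the given evidence.
        self.classifier = nn.Linear(hidden_size, num_phrases)

    def forward(self, evidence_repr: torch.Tensor) -> torch.Tensor:
        # evidence_repr: (batch, hidden_size) pooled encoding of answer + context.
        # Sigmoid gives independent per-phrase validity probabilities (multi-label).
        return torch.sigmoid(self.classifier(evidence_repr))

qpp = QuestionPhrasePredictor()
probs = qpp(torch.randn(2, 768))
# Predicted phrases could then be fed to the question generator as prefixes,
# so one answer evidence yields several differently-phrased questions.
valid = [[QUESTION_PHRASES[i] for i, p in enumerate(row) if p > 0.5] for row in probs]
```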
Generative Entity-to-Entity Stance Detection with Knowledge Graph Augmentation
Stance detection is typically framed as predicting the sentiment in a given
text towards a target entity. However, this setup overlooks the importance of
the source entity, i.e., who is expressing the opinion. In this paper, we
emphasize the need for studying interactions among entities when inferring
stances. We first introduce a new task, entity-to-entity (E2E) stance
detection, which primes models to identify entities in their canonical names
and discern stances jointly. To support this study, we curate a new dataset
with 10,619 annotations labeled at the sentence-level from news articles of
different ideological leanings. We present a novel generative framework to
allow the generation of canonical names for entities as well as stances among
them. We further enhance the model with a graph encoder to summarize entity
activities and external knowledge surrounding the entities. Experiments show
that our model outperforms strong baselines by large margins. Further
analyses demonstrate the usefulness of E2E stance detection for understanding
media quotation and stance landscape, as well as inferring entity ideology.
Comment: EMNLP'22 Main Conference
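As a rough illustration of the generative framing (not the authors' released code), one could fine-tune a seq2seq model to decode a linearized (source entity, stance, target entity) triple; the model choice and output template below are assumptions.

```python
# Sketch: a seq2seq model emits the source entity's canonical name, the
# stance, and the target entity as a single linearized string.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

sentence = "Smith blasted the senator's new tax proposal as reckless."
inputs = tokenizer(sentence, return_tensors="pt")
# After fine-tuning on E2E stance data, the model would be trained to decode
# a hypothetical template such as: "John Smith <stance> negative <target> Jane Doe"
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```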
Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis
Prior work on ideology prediction has largely focused on single modalities,
i.e., text or images. In this work, we introduce the task of multimodal
ideology prediction, where a model predicts binary or five-point scale
ideological leanings, given a text-image pair with political content. We first
collect five new large-scale datasets with English documents and images along
with their ideological leanings, covering news articles from a wide range of US
mainstream media and social media posts from Reddit and Twitter. We conduct
in-depth analyses of news articles and reveal differences in image content and
usage across the political spectrum. Furthermore, we perform extensive
experiments and ablation studies, demonstrating the effectiveness of targeted
pretraining objectives on different model components. Our best-performing
model, a late-fusion architecture pretrained with a triplet objective over
multimodal content, outperforms the state-of-the-art text-only model by almost
4% and a strong multimodal baseline with no pretraining by over 3%.
Comment: EMNLP 2022
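Below is a minimal sketch of late fusion trained with a triplet margin objective, assuming precomputed text and image embeddings; the architecture and dimensions are illustrative, not the paper's exact configuration.

```python
# Late fusion: each modality is encoded separately and combined only at the end.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, fused_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.image_proj = nn.Linear(image_dim, fused_dim)

    def forward(self, text_emb, image_emb):
        return torch.tanh(self.text_proj(text_emb) + self.image_proj(image_emb))

model = LateFusion()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Pretraining pulls same-ideology document pairs together in the fused space
# and pushes different-ideology pairs apart.
anchor = model(torch.randn(8, 768), torch.randn(8, 512))
positive = model(torch.randn(8, 768), torch.randn(8, 512))  # same leaning
negative = model(torch.randn(8, 768), torch.randn(8, 512))  # opposite leaning
loss = triplet_loss(anchor, positive, negative)
loss.backward()
```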
Crossing the Aisle: Unveiling Partisan and Counter-Partisan Events in News Reporting
News media are expected to uphold unbiased reporting, yet they may still
affect public opinion by selectively including or omitting events that support
or contradict their ideological positions. Prior work in NLP has studied media
bias only via linguistic style and word usage. In this paper, we study the
degree to which media balance news reporting and affect consumers through
event inclusion or omission. We first introduce the task of detecting both partisan
and counter-partisan events: events that support or oppose the author's
political ideology. To conduct our study, we annotate a high-quality dataset,
PAC, containing 8,511 (counter-)partisan event annotations in 304 news articles
from ideologically diverse media outlets. We benchmark PAC to highlight the
challenges of this task. Our findings highlight both the ways in which the news
subtly shapes opinion and the need for large language models that better
understand events within a broader context. Our dataset can be found at
https://github.com/launchnlp/Partisan-Event-Dataset.
Comment: EMNLP'23 Findings
All Things Considered: Detecting Partisan Events from News Media with Cross-Article Comparison
Public opinion is shaped by the information news media provide, and that
information in turn may be shaped by the ideological preferences of media
outlets. But while much attention has been devoted to media bias via overt
ideological language or topic selection, a more unobtrusive way in which the
media shape opinion is via the strategic inclusion or omission of partisan
events that may support one side or the other. We develop a latent
variable-based framework to predict the ideology of news articles by comparing
multiple articles on the same story and identifying partisan events whose
inclusion or omission reveals ideology. Our experiments first validate the
existence of partisan event selection, and then show that article alignment and
cross-document comparison detect partisan events and article ideology better
than competitive baselines. Our results reveal the high-level form of media
bias, which is present even among mainstream media with strong norms of
objectivity and nonpartisanship. Our codebase and dataset are available at
https://github.com/launchnlp/ATC.
Comment: EMNLP'23 Main Conference
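The toy heuristic below illustrates only the core intuition of cross-article comparison (not the paper's latent-variable model): events reported almost exclusively by outlets on one side of the same story are candidate partisan events. The input format and threshold are assumptions.

```python
# Toy illustration: flag events whose coverage is skewed toward one ideology.
from collections import defaultdict

# Hypothetical input: per-article (outlet ideology, set of event mentions),
# all covering the same story.
articles = [
    ("left",  {"protest", "police response", "injured bystander"}),
    ("left",  {"protest", "injured bystander"}),
    ("right", {"protest", "property damage"}),
    ("right", {"protest", "police response", "property damage"}),
]

counts = defaultdict(lambda: {"left": 0, "right": 0})
for side, events in articles:
    for e in events:
        counts[e][side] += 1

for event, c in counts.items():
    total = c["left"] + c["right"]
    skew = (c["left"] - c["right"]) / total  # +1 = left-only, -1 = right-only
    if abs(skew) >= 0.5:
        print(f"candidate partisan event: {event!r} (skew {skew:+.2f})")
```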
You Are What You Annotate: Towards Better Models through Annotator Representations
Annotator disagreement is ubiquitous in natural language processing (NLP)
tasks. There are multiple reasons for such disagreements, including the
subjectivity of the task, difficult cases, unclear guidelines, and so on.
Rather than simply aggregating labels to obtain data annotations, we instead
try to directly model the diverse perspectives of the annotators, and
explicitly account for annotators' idiosyncrasies in the modeling process by
creating representations for each annotator (annotator embeddings) and also
their annotations (annotation embeddings). In addition, we propose TID-8, The
Inherent Disagreement - 8 dataset, a benchmark that consists of eight existing
language understanding datasets that have inherent annotator disagreement. We
test our approach on TID-8 and show that our approach helps models learn
significantly better from disagreements on six different datasets in TID-8
while increasing model size by less than 1% in parameters. By capturing the
unique tendencies and subjectivity of individual annotators through embeddings,
our representations prime AI models to be inclusive of diverse viewpoints.
Comment: Accepted to Findings of EMNLP 2023
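A minimal sketch of the annotator-embedding idea follows, assuming an encoder that yields a pooled sentence vector; sizes and names are illustrative, not the paper's released implementation.

```python
# One learned vector per annotator conditions the prediction on who is
# annotating; for realistic annotator counts this adds far less than 1%
# of a transformer's parameters.
import torch
import torch.nn as nn

class AnnotatorAwareClassifier(nn.Module):
    def __init__(self, hidden_size=768, num_annotators=50, num_labels=3):
        super().__init__()
        self.annotator_emb = nn.Embedding(num_annotators, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, sentence_repr, annotator_ids):
        # Add the annotator's embedding to the pooled sentence representation.
        return self.classifier(sentence_repr + self.annotator_emb(annotator_ids))

model = AnnotatorAwareClassifier()
logits = model(torch.randn(4, 768), torch.tensor([0, 3, 7, 3]))
```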
Identifying inherent disagreement in natural language inference
Natural language inference (NLI) is the task of determining whether a piece of text is entailed by, contradicted by, or unrelated to another piece of text. In this paper, we investigate how to tease systematic inferences (i.e., items for which people agree on the NLI label) apart from disagreement items (i.e., items which lead to different annotations), which most prior work has overlooked. To distinguish systematic inferences from disagreement items, we propose Artificial Annotators (AAs) to simulate the uncertainty in the annotation process by capturing the modes in annotations. Results on the CommitmentBank, a corpus of naturally occurring discourses in English, confirm that our approach performs statistically significantly better than all baselines. We further show that AAs learn linguistic patterns and context-dependent reasoning.
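Finally, a minimal sketch of the Artificial Annotators idea: an ensemble whose members each mimic one annotator's labeling, with disagreement among members flagging inherently ambiguous items. The head architecture and ambiguity threshold are assumptions, not the paper's exact setup.

```python
# Ensemble of lightweight per-annotator heads over a shared item encoding;
# the spread of their votes approximates the modes in human annotations.
import torch
import torch.nn as nn

class ArtificialAnnotators(nn.Module):
    def __init__(self, hidden_size=768, num_labels=3, num_annotators=8):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, num_labels) for _ in range(num_annotators)
        )

    def forward(self, item_repr):
        # Each simulated annotator votes for a label.
        return torch.stack([head(item_repr).argmax(-1) for head in self.heads])

model = ArtificialAnnotators()
votes = model(torch.randn(2, 768))  # (num_annotators, batch)
# An item is flagged as inherently ambiguous when the simulated annotators
# split across labels instead of concentrating on a single mode.
for i in range(votes.shape[1]):
    majority = votes[:, i].mode().values
    frac = (votes[:, i] == majority).float().mean().item()
    print(f"item {i}: majority fraction {frac:.2f} -> "
          f"{'ambiguous' if frac < 0.75 else 'systematic'}")
```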